
of orthologs (OrthoDB) to measure genome assembly and annotation completeness. The quality of a genome assembly is summarized by several metrics: C for complete, D for duplicated, F for fragmented, M for missing, and n for the number of genes used in the assessment. A gene recovered from the de novo assembly is reported as complete (C) when its length is within two standard deviations of the mean length of the genes in the ortholog database. Complete genes found in more than one copy are reported as duplicated (D), which can indicate that haplotypes were assembled inaccurately. Incomplete or partially recovered genes are reported as fragmented (F), and genes that are not recovered are reported as missing (M). The number of genes used (n) reflects the confidence of the assessment results.
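To make the notation concrete, the following minimal Python sketch parses a one-line BUSCO summary into its scores. The summary string and its values are invented for illustration, and the function name parse_busco_summary is hypothetical. The pattern assumes the compact form in which BUSCO prints its results, where the complete score is additionally split into single-copy (S) and duplicated (D) parts; S is not among the metrics listed above.

```python
import re

# Assumed layout of BUSCO's one-line summary, for example:
#   C:95.0%[S:93.0%,D:2.0%],F:3.0%,M:2.0%,n:255
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (C, S, D, F, M) and the gene count n as a dict."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a BUSCO summary: %r" % line)
    # Percentages become floats; n is the integer number of genes searched.
    scores = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


# Illustrative values only, not results from a real assembly.
example = "C:95.0%[S:93.0%,D:2.0%],F:3.0%,M:2.0%,n:255"
scores = parse_busco_summary(example)
print(scores)
```

Because every gene is counted as complete, fragmented, or missing, C, F, and M should add up to roughly 100%, which gives a quick sanity check on a parsed line.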

BUSCO relies on several third-party software packages that must be installed for the program to run properly. The BUSCO dependencies include Python 3.x, BioPython, pandas, tBLASTn 2.2+, Augustus 3.2, Prodigal, Metaeuk, HMMER 3.1+, SEPP, and R with ggplot2 for the plotting companion script. Some of these packages are needed only in certain cases. For the complete installation instructions, visit the BUSCO website at “https://busco.ezlab.org/busco_userguide.html”. You can run the following commands on the Linux command line to install some of the BUSCO third-party dependencies and the BUSCO software on Ubuntu:

sudo apt update && sudo apt upgrade
pip install biopython
pip install pandas
sudo apt install ncbi-blast+
sudo apt install augustus augustus-data augustus-doc
sudo apt install prodigal
sudo apt install hmmer
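After running the commands above, it can be useful to confirm that the tools are actually reachable. The sketch below is one possible check, not part of BUSCO itself: the function name find_missing is hypothetical, and the executable names (tblastn, augustus, prodigal, hmmsearch) are assumptions about what the Ubuntu packages place on the PATH.

```python
import importlib.util
import shutil

# Assumed executable names provided by the packages installed above.
REQUIRED_TOOLS = ["tblastn", "augustus", "prodigal", "hmmsearch"]
# Import names of the Python packages (BioPython is imported as "Bio").
REQUIRED_MODULES = ["Bio", "pandas"]


def find_missing(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) for the current environment."""
    # shutil.which returns None when an executable is not on the PATH.
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing_modules = [name for name in modules
                       if importlib.util.find_spec(name) is None]
    return missing_tools, missing_modules


missing_tools, missing_modules = find_missing()
print("Missing command-line tools:", missing_tools or "none")
print("Missing Python modules:", missing_modules or "none")
```

An empty result only shows that the programs and modules can be found; it does not check that their versions meet the minimums listed above.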

BUSCO software can be cloned and installed by running the following commands:

git clone https://gitlab.com/ezlab/busco.git

cd busco/

python3 setup.py install --user
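A per-user installation usually places scripts in the user's own bin directory, which is not always on the PATH. The following sketch looks for the installed script in both places. The function name locate_busco is hypothetical, and the script name busco is an assumption about what the installer creates.

```python
import os
import shutil
import site


def locate_busco(command="busco"):
    """Return the path to the command, or None if it cannot be found."""
    # First choice: the command is already on the PATH.
    found = shutil.which(command)
    if found:
        return found
    # Fallback: the per-user scripts directory (for example ~/.local/bin on Linux).
    candidate = os.path.join(site.getuserbase(), "bin", command)
    return candidate if os.path.isfile(candidate) else None


path = locate_busco()
if path is None:
    print("busco was not found on the PATH or in the per-user bin directory")
else:
    print("busco found at", path)
```

If the script is found only in the per-user directory, adding that directory to the PATH lets the command be run by name.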

FIGURE 3.12  Icarus contig browser displaying de novo assemblies aligned to a reference genome.